Questions
- How can I create publication-quality graphics in R?
Objectives
- To be able to use ggplot2 to generate publication-quality graphics.
- To understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and coloring or panelling by groups.
Plotting the data is one of the best ways to quickly explore it and generate hypotheses about various relationships between variables.
There are several plotting systems in R, but today we will focus on ggplot2, which implements the grammar of graphics: a coherent system for describing the components that make up a visual representation of data. For more information about the principles and thinking behind the ggplot2 graphics system, please refer to “A Layered Grammar of Graphics” by Hadley Wickham (@hadleywickham).
The advantage of ggplot2 is that it allows R users to create publication-quality graphics with just a few lines of code. ggplot2 has a large user base and is actively developed and extended by the community.
ggplot2 is a core member of the tidyverse family of packages. Installing and loading the tidyverse package loads ggplot2 together with the other packages we will need for this workshop (the gapminder data package is loaded separately). Let's get started!
# install.packages("tidyverse")
# install.packages("gapminder")
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(gapminder)
If the code above produces the error “there is no package called ‘tidyverse’”, uncomment the install.packages() line (remove the leading #) and run it before you load the library. You only need to install a package once, but you will have to reload it with the library() command every time you restart R.
Today we will be working with the gapminder dataset, which is an excerpt from the Gapminder data. Once the gapminder package is loaded, the data is available to you.
You can have a look at the contents of the gapminder data frame by simply typing gapminder either in an R chunk or in the console. A data frame is a rectangular collection of data, where variables are organized as columns and observations as rows.
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # … with 1,694 more rows
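The dataset contains the following fields:
- country: country name
- continent: continent the country belongs to
- year: year of observation, from 1952 to 2007 in 5-year steps
- lifeExp: life expectancy at birth, in years
- pop: population
- gdpPercap: GDP per capita (US$, inflation-adjusted)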
More information about the package and the data is available in the help. Just type ?gapminder in the console, or type gapminder in the search field of the Help tab in the bottom-right panel of RStudio. Whenever you are unsure about anything in R, it is a good idea to check the help file using one of these two methods.
Here’s a question that we would like to answer using the gapminder data: do people in rich countries live longer than people in poor countries? The answer may be quite intuitive, but we will take our investigation further: what does the relationship between GDP per capita and life expectancy look like? Is it linear? Non-linear? Are there exceptions to the general rule (outliers)?
To plot gapminder, run the following code in an R chunk or in the console. It puts gdpPercap on the x-axis and lifeExp on the y-axis:
ggplot(data = gapminder) +
geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
Note that we split the command over two lines. The “plus” sign at the end of the first line tells R that the command is not over yet and that the next line should be added as an additional layer to the plot created by ggplot(). In other words, when a ggplot() command spans several lines, the + sign goes at the end of a line, not at the beginning of the next one.
The plot shows a positive, non-linear relationship between GDP per capita and life expectancy.
Does this graph confirm or disprove your initial hypothesis about the relationship between these variables?
Note that to create a plot with the ggplot2 system, you start your command with the ggplot() function. It creates an empty coordinate system and initializes the dataset to be used in the graph (supplied as the first argument of ggplot()). To create a graphical representation of the data, we add one or more layers to this otherwise empty graph. Functions starting with the prefix geom_ create a visual representation of the data. In this case we added scattered points using the geom_point() function. There are many geoms in ggplot2, some of which we will learn in this lesson.
Each geom_ function maps variables from the previously defined dataset to aesthetic elements of the graph, such as axes, shapes or colors. The first argument of any geom_ function, mapping, expects the user to specify these mappings, wrapped in the aes() (short for aesthetics) function. In this case, we mapped the gdpPercap and lifeExp variables from the gapminder dataset to the x- and y-axis, respectively (using the x and y arguments of the aes() function).
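To see these two steps separately, here is a minimal sketch that stores the plot in a variable (the name p is just an illustrative choice). On its own, ggplot() returns a plot object with no layers, which prints as an empty panel; adding geom_point() with + returns a new plot object with one layer.

```r
library(ggplot2)
library(gapminder)

# ggplot() alone: data and an empty coordinate system, no layers yet
p <- ggplot(data = gapminder)
length(p$layers)  # 0

# `+` returns a new plot object with the geom_point() layer appended
p <- p + geom_point(mapping = aes(x = gdpPercap, y = lifeExp))
length(p$layers)  # 1

# Printing the object draws the plot
print(p)
```

Building a plot step by step like this can be handy when you want to try several layers on the same base.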
Generally speaking, the template for visualizing data in ggplot2 can be summarized as follows:
`ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))`
In the remainder of this lesson we will learn how to extend and complete this template using different elements to produce various visualizations. First, we will take a closer look at the <MAPPINGS> component.
- How did life expectancy change over time? What do you observe? Note that many points are plotted on top of each other. This is called “overplotting”. Try a different geom_ function called geom_jitter(). It spreads the points apart a little bit using random noise.
  Hint: the gapminder dataset has a column called year, which should appear on the x-axis.
- See if you can visualize life expectancy by continent. Which continent tends to have the highest life expectancy (notice the density of the points along the y-axis)? The lowest life expectancy? Which continent has the largest spread in life expectancy values? How about the smallest spread?
## Part 1
ggplot(gapminder)+
geom_point(mapping = aes(x=year, y=lifeExp))
# fix overplotting
ggplot(gapminder)+
geom_jitter(mapping = aes(x=year, y=lifeExp))
## Part 2
ggplot(gapminder)+
geom_jitter(mapping = aes(x=continent, y=lifeExp))
What if we want to combine the graphs from the previous two challenges and show the relationship between three variables in the same graph? It turns out we don’t need a third geometric dimension; we can simply use color.
The following graph maps the continent variable from the gapminder dataset to the color aesthetic of the plot. Let’s take a look:
ggplot(data = gapminder) +
geom_jitter(mapping = aes(x = year, y = lifeExp, color=continent))
- What will happen if you switch the mappings of continent and year in the previous example? Is the graph still useful? Why? What if you map the color aesthetic to country? What has changed? How is year different from country? What is the limitation of the color aesthetic when used to visualize different types of data?
- Can you add a little color to our initial graph of life expectancy by GDP per capita? Color the points by continent. There seem to be some outliers in this graph. Can you now spot which continent these points belong to? How about using a color gradient to illustrate change over time?
  Hint: you may want to transform GDP per capita to a logarithmic scale before plotting. Just wrap the name of the variable in the log() function.
## Part 1
ggplot(data = gapminder) +
geom_jitter(mapping = aes(x = continent, y = lifeExp, color=year))
# Color by country
ggplot(data = gapminder) +
geom_jitter(mapping = aes(x = continent, y = lifeExp, color=country))
## Part 2
ggplot(data = gapminder) +
geom_jitter(mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent))
# change over time
ggplot(data = gapminder) +
geom_jitter(mapping = aes(x = log(gdpPercap), y = lifeExp, color=year))
There are other aesthetics that can come in handy. One of them is size. This aesthetic varies the size of the data points to illustrate another continuous variable, such as country population. Let's look at four dimensions at once!
ggplot(data = gapminder) +
geom_point(mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent, size=pop))
One more useful aesthetic, called shape, is good for visualizing low-cardinality categorical variables (categorical variables with a small number of unique values). It lets you use different shapes (other than circles) to plot the data.
- Blow your mind by visualizing five(!) dimensions in the same graph. Modify the previous example by mapping year to color and continent to shape. What can you say about those Asian outliers: do they belong to small or large countries? Are they from earlier or later time periods?
ggplot(data = gapminder) +
geom_point(mapping = aes(x = log(gdpPercap), y = lifeExp, color=year, shape=continent, size=pop))
Combining too many aesthetics in the same graph can make it quite busy. However, you can always remove certain aesthetic properties and use several graphs to highlight different aspects of data.
Until now, we have mapped different aesthetic properties of a graph to variables. What if you want to use one particular color or shape for all data points? Such a color or shape is no longer mapped to any data, so you supply it to the geom_ function as a separate argument (outside of the mapping). Here’s our initial graph with all points colored blue.
ggplot(data = gapminder) +
geom_point(mapping = aes(x = gdpPercap, y = lifeExp), alpha=0.1, size=2, color="blue")
Note: This plot uses the alpha aesthetic, which varies the opacity of the data points from completely opaque (alpha=1) to completely transparent (alpha=0). Feel free to experiment with it, setting the transparency of the data points both inside and outside aes(). What can be the benefit of each of these methods?
Once more, observe that in the example above the color is not mapped to any variable from the gapminder dataset and applies equally to all data points; therefore it is given outside the mapping argument and is not wrapped in the aes() function. Note that unmapped colors are supplied as character strings (in quotes), size is a number (the size of the point in mm), and shape is the numeric code of the shape in R’s internal vocabulary (where 0 is an open square, 1 an open circle, 2 an open triangle and 20 a small filled circle). Explore different shapes by varying the shape number between 0 and 25, or refer to the ggplot2 vignette on aesthetic specifications (http://docs.ggplot2.org/current/vignettes/ggplot2-specs.html) for details. The same document can be opened from within R by calling vignette("ggplot2-specs").
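A quick way to see all 26 shape codes at once is to plot them. This is a small sketch on a made-up data frame (shapes is a hypothetical name), using scale_shape_identity() so that the numbers stored in the data are used directly as shape codes. The fill color only shows up for shapes 21 to 25, which have a separate outline and fill.

```r
library(ggplot2)

# One row per shape code 0..25, laid out on a grid with 6 columns
shapes <- data.frame(shape = 0:25)
shapes$x <- shapes$shape %% 6
shapes$y <- -(shapes$shape %/% 6)  # negative so that shape 0 is in the top row

p <- ggplot(shapes, aes(x = x, y = y)) +
  # scale_shape_identity() below makes the shape column act as raw shape codes
  geom_point(aes(shape = shape), size = 5, color = "black", fill = "skyblue") +
  # Print each code just below its symbol
  geom_text(aes(label = shape), nudge_y = -0.35, size = 3) +
  scale_shape_identity() +
  theme_void()
print(p)
```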
Next, we will consider different options for the ggplot2 template. By using different geom_ functions you can highlight different aspects of the data. For example, we could connect the data points belonging to the same country into a line and illustrate the development of life expectancy over time for each country separately, using the geom_line() function.
Some geom_ functions need additional aesthetics, such as the group aesthetic in geom_line(). group tells ggplot2 which data points belong together. It may not matter in other geoms, but here it allows us to draw multiple lines, one per country. To keep the lines organized, we will color them by continent.
ggplot(data = gapminder) +
geom_line(mapping = aes(x = year, y = lifeExp, group=country, color=continent))
Note how life expectancy suddenly drops for certain countries for a short period of time. Later in this workshop we will learn how to zoom in on those tragic periods of history and find out which countries experienced them.
Another useful geom function is geom_boxplot(). It adds a layer with a “box and whiskers” plot illustrating the distribution of values within categories. The following chart breaks down life expectancy by continent: the box spans the first to the third quartile (the 25th and 75th percentiles), the middle bar marks the median, and the whiskers extend to the most extreme values that lie within 1.5 times the interquartile range (IQR) of the box. Values beyond the whiskers are plotted individually as outliers.
ggplot(data = gapminder) +
geom_boxplot(mapping = aes(x = continent, y = lifeExp))
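The whisker rule can be reproduced by hand. The sketch below uses a small made-up vector rather than the gapminder data; it computes the quartiles with quantile() (whose default method is also what ggplot2's stat_boxplot() uses), then the 1.5 × IQR fences, the whisker ends and the outliers.

```r
# Toy values (made up for illustration); 30 lies far from the rest
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Box: first quartile, median, third quartile
q <- quantile(x, c(0.25, 0.5, 0.75))
iqr <- q[[3]] - q[[1]]

# Fences at 1.5 * IQR beyond each side of the box
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

# Everything beyond the fences is drawn as an individual outlier point
outliers <- x[x < lower_fence | x > upper_fence]

c(q, whisker_low = whisker_low, whisker_high = whisker_high)
outliers  # 30
```

With ggplot2 loaded, layer_data() applied to a boxplot returns the values ggplot2 actually computed (columns such as ymin, lower, middle, upper and ymax), which is a convenient way to check the numbers behind the continent boxplots above.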
Layers can be added on top of each other. In the following graph we will place the boxplots over jittered points to see the distribution of the individual data points, including outliers, more clearly. We can also map two aesthetic properties to the same variable: here continent is mapped to both x and color, giving each continent its own color.
ggplot(data = gapminder) +
geom_jitter(mapping = aes(x = continent, y = lifeExp, color=continent)) +
geom_boxplot(mapping = aes(x = continent, y = lifeExp, color=continent))
Now, this was slightly inefficient due to duplication of code - we had to specify the same mappings for two layers. To avoid it, you can move common arguments of geom_ functions to the main ggplot() function. In this case every layer will “inherit” the same arguments, specified in the “parent” function.
ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp, color=continent)) +
geom_jitter() +
geom_boxplot()
You can still add layer-specific mappings or other arguments by specifying them within individual geoms. We recommend building each layer separately and then moving the common arguments up to the “parent” function (“first explicit, then implicit”).
We can use linear models to highlight differences in the relationship between GDP per capita and life expectancy across continents. Notice that we added a separate argument, method="lm", to the geom_smooth() function to specify the type of model we want ggplot2 to build from the data (in this case, a linear model). geom_smooth() also draws a confidence interval around each fitted line (the shaded gray area, 95% by default), which shows how uncertain the estimated trend is. For more information on the available models, please refer to the help (by typing ?geom_smooth).
ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent)) +
geom_point(alpha=0.5) +
geom_smooth(method="lm")
Notice that we also used the previously discussed visual property alpha to make the data points more transparent, so that the trend lines stand out. As you might remember, alpha can also be used as a mapping aesthetic, i.e. the transparency can be made to vary with the value of a certain variable.
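geom_smooth(method = "lm") fits an ordinary linear regression of y on x separately for each group, here each continent. If you want the numbers behind the lines, a sketch like the following fits the same models with lm() (the names fits and slopes are illustrative). Each slope is the average change in life expectancy, in years, for a one-unit increase in log GDP per capita, i.e. when GDP per capita is multiplied by e ≈ 2.72.

```r
library(gapminder)

# Split the data by continent and fit lifeExp ~ log(gdpPercap) in each part,
# the same model geom_smooth(method = "lm") draws when color = continent
fits <- lapply(split(gapminder, gapminder$continent),
               function(d) lm(lifeExp ~ log(gdpPercap), data = d))

# Extract the slope of each fitted line, named by continent
slopes <- sapply(fits, function(m) coef(m)[["log(gdpPercap)"]])
round(slopes, 2)
```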
- Modify the graph above to force R to create a single regression line for all data points. Keep the points colored by continent. Hint: there could be several alternative solutions to this problem.
In the graph above, each geom inherited all three mappings: x, y and color. If we want only a single linear model to be built, we need to limit the effect of the color aesthetic to the geom_point() layer, by moving it from the “parent” function to the layer where we want it to apply. Note, though, that because we still want the color to be mapped to the continent variable, it needs to be wrapped in the aes() function and supplied to the mapping argument.
An alternative solution is a “hack” based on overriding the inherited color aesthetic in the geom_smooth() layer with a fixed color. Because the fixed color replaces the mapped one, that layer no longer splits the data by continent. This works fine, but it may be a little harder to see what’s going on.
ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp)) +
geom_point(mapping=aes(color=continent), alpha=0.5) +
geom_smooth(method="lm")
# Alternative solution
ggplot(data = gapminder, mapping = aes(x = log(gdpPercap), y = lifeExp, color=continent)) +
geom_point(alpha=0.5) +
geom_smooth(method="lm", color="black")
As you can see, the x-axis label of our graph reads log(gdpPercap), which indicates that we are not plotting the original data but the output of the log() function. The same effect (with a more pleasing x-axis label) can be achieved by adding an x-axis scale transformation to the plot. Instead of transforming the values, we will transform the scale of the x-axis.
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color=continent)) +
geom_point() +
geom_smooth() +
scale_x_log10()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Now the x-axis uses a log10 scale (the tick labels still show GDP per capita in its original units), and the data plotted on this scale look more linear. Certain scale and coordinate functions may produce similar-looking charts, but the way they interact with other elements, such as statistical layers, may be quite different. Check out the online ggplot2 documentation for more details and examples of scale and coordinate transformations.
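One way to see the difference is to fit a straight line both ways. In this sketch (the plot names are illustrative), scale_x_log10() transforms the data before geom_smooth() fits its model, so the fitted line is straight on the log axis; coord_trans(x = "log10") transforms only the finished drawing, so a line fitted to the raw gdpPercap values appears curved. Recent ggplot2 releases may prefer the newer name coord_transform() for coord_trans().

```r
library(ggplot2)
library(gapminder)

base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.2)

# Scale transformation: values are log10-transformed first, then the
# linear model is fitted, so the fitted line is straight on this axis
p_scale <- base + geom_smooth(method = "lm") + scale_x_log10()

# Coordinate transformation: the model is fitted on the raw values and only
# the drawing is log10-transformed, so the fitted line looks curved
p_coord <- base + geom_smooth(method = "lm") + coord_trans(x = "log10")

print(p_scale)
print(p_coord)
```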
- Make a boxplot of life expectancy by year. Hint: you may need to do something with the year variable to force it to be categorical, or follow the advice suggested by ggplot. When was the interquartile range of life expectancy the smallest? Make the same plot of gdpPercap (on a log scale) per year. Compared to 1952, is the world today more or less diverse in terms of the IQR of GDP per capita?
- Make a histogram of untransformed and transformed gdpPercap. Note that a histogram requires you to specify only one variable, mapped to the x aesthetic. What is the shape of the distribution? Why is the bins parameter important for interpreting the histogram?
- Build a density plot (also a univariate plot). How would you compare the densities of different continents?
- Based on the graph produced with geom_density2d() of log GDP per capita vs life expectancy, how many clusters of data points can you identify? What if you look at it by continent?
## Part 1
# force year to become categorical
ggplot(gapminder)+
geom_boxplot(mapping = aes(y=lifeExp, x=as.character(year))) # simple x=year will not work
# ggplot suggested solution
ggplot(gapminder)+
geom_boxplot(mapping = aes(y=lifeExp, x=year, group=year))
# gdpPercap
ggplot(gapminder)+
geom_boxplot(mapping = aes(y=gdpPercap, x=year, group=year))+
scale_y_log10()
## Part 2
ggplot(gapminder)+
geom_histogram(mapping = aes(x=gdpPercap))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# on log scale with higher number of bins
ggplot(gapminder)+
geom_histogram(mapping = aes(x=gdpPercap),bins=100) +
scale_x_log10()
## Part 3
# density
ggplot(gapminder)+
geom_density(mapping = aes(x=gdpPercap)) +
scale_x_log10()
# by continent
ggplot(gapminder)+
geom_density(mapping = aes(x=gdpPercap, color=continent)) +
scale_x_log10()
## Part 4
# Density 2d
ggplot(gapminder)+
geom_density2d(mapping = aes(x=gdpPercap, y=lifeExp)) +
scale_x_log10()
# by continent
ggplot(gapminder)+
geom_density2d(mapping = aes(x=gdpPercap, y=lifeExp, color=continent)) +
scale_x_log10()
Multi-layered graphs employing several aesthetics can look crowded. To avoid this, you can split the data into a panel of similar graphs. In ggplot2 this method is called “faceting”. Let's facet the graph above by continent and show the data points and the trend for each continent in a separate chart.
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
geom_smooth() +
scale_x_log10() +
facet_wrap( vars(continent))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The facet_wrap() layer takes variables wrapped in the vars() function, which tells ggplot2 to look them up in the data. Here this means drawing a panel for each unique value in the continent column of the gapminder dataset. Faceting works best when the number of panels is limited. Notice that R places the panels from left to right, “wrapping” those that do not fit in one row onto a new line. To learn about more advanced faceting, including faceting over several variables, see the help page ?facet_grid.
Note: In code written for older versions of ggplot2, which you may come across on the web, you will see a “one-sided formula” inside facet_wrap(). Don’t panic: facet_wrap(~continent) is perfectly valid code that still works in the current version of the package.
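As a sketch of faceting over two variables, facet_grid() lays the panels out in a grid, with one variable on the rows and another on the columns. To keep the grid small, this example (gap_two_years is an illustrative name) keeps only 1952 and 2007 using base R subsetting.

```r
library(ggplot2)
library(gapminder)

# Keep only the first and last year so the grid has two columns
gap_two_years <- gapminder[gapminder$year %in% c(1952, 2007), ]

p <- ggplot(gap_two_years, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  # One row of panels per continent, one column per year
  facet_grid(rows = vars(continent), cols = vars(year))
print(p)
```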
Extending our ggplot2 template with what we have learned so far, we can state:
`ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
<FACET_FUNCTION>`
- Try faceting by year, keeping the linear smoother. Is there any change in slope of the linear trend over the years? What if you look at linear models per continent?
# by year
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_log10() +
facet_wrap( vars(year))
# by continent
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_log10() +
facet_wrap( vars(continent))
Sometimes, when a categorical variable is plotted on the x-axis, the bars end up too narrow and the labels become unreadable. One way of dealing with this is to flip the coordinate system, i.e. plot the same data as horizontal bars. Let’s try to show the population of every Asian country in 2007.
Note: this example requires the filter() function, which we have not studied yet. Hang on, it is coming very soon!
ggplot(filter(gapminder, year==2007, continent=="Asia")) +
geom_bar(mapping = aes(x=country, y=pop), stat="identity") +
coord_flip()
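In ggplot2 3.3.0 and later you can also get horizontal bars without coord_flip(), by mapping the categorical variable to y. The sketch below also uses geom_col(), which is shorthand for geom_bar(stat = "identity"), and reorder() to sort the countries by population; asia_2007 is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

asia_2007 <- filter(gapminder, year == 2007, continent == "Asia")

# Categorical variable on y gives horizontal bars (ggplot2 >= 3.3.0);
# reorder() sorts the countries by their population
p <- ggplot(asia_2007) +
  geom_col(mapping = aes(x = pop, y = reorder(country, pop))) +
  labs(x = "Population", y = "Country")
print(p)
```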
There are many functions related to coordinate systems that allow, among other things, plotting in non-Cartesian coordinates (e.g. coord_polar() for polar coordinates and coord_map() for map projections such as Mercator) and setting manual limits for the axes (e.g. coord_cartesian()).
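Setting limits through the coordinate system and through a scale behaves differently, which matters as soon as statistics are involved. In this sketch (the plot names are illustrative), coord_cartesian() only zooms in, so the linear model is still fitted to every country; setting limits on the scale removes the out-of-range points before the model is fitted (ggplot2 warns about removed rows), so the fitted line can change.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm")

# Zoom: all data are used for the fit, the view is just cropped
p_zoom <- p + coord_cartesian(xlim = c(0, 20000))

# Scale limits: points with gdpPercap > 20000 are dropped before fitting
p_drop <- p + scale_x_continuous(limits = c(0, 20000))

print(p_zoom)
print(p_drop)
```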
Lastly, we will learn how to label and annotate the chart using the labs() and annotate() functions.
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
geom_point() +
scale_x_log10() +
facet_wrap(vars(continent)) +
# labs() sets the title, subtitle, caption and the axis and legend titles
labs(title="Life Expectancy vs GDP per capita over time",
subtitle="In the past 50 years, life expectancy has improved in most countries of the world",
caption="Source: Gapminder foundation, https://www.gapminder.org/data/",
x="GDP per capita, USD",
y="Life expectancy, years",
color="Continent",
size="Population")
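The chart above only uses labs(). annotate() adds a one-off layer whose positions are given directly, in data units, instead of being mapped from a data frame. A minimal sketch; the coordinates and the label text are arbitrary choices for illustration, not a claim about the data:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  scale_x_log10() +
  # Shaded rectangle; x positions are given in data units (USD) even on a log scale
  annotate("rect", xmin = 30000, xmax = 120000, ymin = 20, ymax = 85,
           fill = "orange", alpha = 0.2) +
  # Free-standing text label
  annotate("text", x = 60000, y = 88, label = "High GDP per capita") +
  labs(x = "GDP per capita, USD", y = "Life expectancy, years")
print(p)
```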
The graph produced in the previous section looks quite good, but it shows all twelve years of data at once, so the reader cannot follow how countries move over time. This may be better illustrated by “animating” the time dimension of the data: showing the chart for each of the twelve years one after another.
# install.packages("gganimate")
# install.packages("gifski")
library(gganimate)
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
geom_point() +
scale_x_log10() +
facet_wrap(vars(continent)) +
# {frame_time} is filled in by gganimate with the time value of each frame
labs(title="Life Expectancy vs GDP per capita in {frame_time}",
subtitle="In the past 50 years, life expectancy has improved in most countries of the world",
caption="Source: Gapminder foundation, https://www.gapminder.org/data/",
x="GDP per capita, USD",
y="Life expectancy, years",
color="Continent",
size="Population") +
# Here come the gganimate-specific bits: animate over year with linear easing
transition_time(year) +
ease_aes('linear')
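In an R notebook the animation is rendered when the object is printed. To control the rendering, or to save the result to a file, you can store the animation and pass it to animate() and anim_save(). A sketch, assuming the gifski package is installed; the object and file names are illustrative and the labels are simplified. Wrapping frame_time in round() keeps the title from showing fractional years for the interpolated frames.

```r
library(ggplot2)
library(gapminder)
library(gganimate)

anim <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(vars(continent)) +
  # round() so the title shows whole years between the observed years
  labs(title = "Life Expectancy vs GDP per capita in {round(frame_time)}") +
  transition_time(year) +
  ease_aes("linear")

# Render 100 frames at 10 frames per second as a GIF (needs the gifski package)
if (requireNamespace("gifski", quietly = TRUE)) {
  gif <- animate(anim, nframes = 100, fps = 10, renderer = gifski_renderer())
  # Write the rendered animation to a file in the working directory
  anim_save("gapminder_animation.gif", animation = gif)
}
```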
We conclude this lesson by reiterating our ggplot2 data visualization template.
`ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
stat = <STAT>) +
<SCALE_FUNCTION> +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION> +
<LABS>`
This template contains eight components: data, geom function, mappings, stat, scale, coordinate system, facets and labels. However, it is rare that all of them need to be specified in a given chart. Most of the time ggplot2 offers useful defaults for everything other than the data, geoms and mappings.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
Still bored?
- Use several graphs and necessary filters to narrow down your search to those few outliers with high gdpPercap. What are those countries and in which years? What might be the reason?
  Hint: You may want to experiment with geom_text() to get the country labels to show on the chart.
- Use several graphs and necessary filters to narrow down your search to those few outliers with extraordinarily low life expectancy. What are those countries and in which years? What might be the reason?
When you are working with data from different countries, it might also be a good idea to use maps to convey your data in a familiar way. ggplot2 has a new geom called geom_sf() that helps you plot maps and use aesthetics in the same way as in other geoms.
Map data often comes as a shapefile, for example the world borders file that can be downloaded from thematicmapping.org, and the sf package can read such files into R as map coordinates (simple features). In the code below we take a shortcut: the rnaturalearth package provides world country boundaries that are returned directly as an sf object.
library(sf)
## Linking to GEOS 3.7.1, GDAL 2.4.2, PROJ 5.2.0
# install.packages("rnaturalearth")
# install.packages("rnaturalearthdata")  # country boundaries used by ne_countries()
world_map <- rnaturalearth::ne_countries(returnclass = "sf")
world_map
## Simple feature collection with 177 features and 63 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## scalerank featurecla labelrank sovereignt sov_a3 adm0_dif
## 0 1 Admin-0 country 3 Afghanistan AFG 0
## 1 1 Admin-0 country 3 Angola AGO 0
## 2 1 Admin-0 country 6 Albania ALB 0
## 3 1 Admin-0 country 4 United Arab Emirates ARE 0
## 4 1 Admin-0 country 2 Argentina ARG 0
## 5 1 Admin-0 country 6 Armenia ARM 0
## 6 1 Admin-0 country 4 Antarctica ATA 0
## 7 3 Admin-0 country 6 France FR1 1
## 8 1 Admin-0 country 2 Australia AU1 1
## 9 1 Admin-0 country 4 Austria AUT 0
## level type admin adm0_a3
## 0 2 Sovereign country Afghanistan AFG
## 1 2 Sovereign country Angola AGO
## 2 2 Sovereign country Albania ALB
## 3 2 Sovereign country United Arab Emirates ARE
## 4 2 Sovereign country Argentina ARG
## 5 2 Sovereign country Armenia ARM
## 6 2 Indeterminate Antarctica ATA
## 7 2 Dependency French Southern and Antarctic Lands ATF
## 8 2 Country Australia AUS
## 9 2 Sovereign country Austria AUT
## geou_dif geounit gu_a3 su_dif
## 0 0 Afghanistan AFG 0
## 1 0 Angola AGO 0
## 2 0 Albania ALB 0
## 3 0 United Arab Emirates ARE 0
## 4 0 Argentina ARG 0
## 5 0 Armenia ARM 0
## 6 0 Antarctica ATA 0
## 7 0 French Southern and Antarctic Lands ATF 0
## 8 0 Australia AUS 0
## 9 0 Austria AUT 0
## subunit su_a3 brk_diff
## 0 Afghanistan AFG 0
## 1 Angola AGO 0
## 2 Albania ALB 0
## 3 United Arab Emirates ARE 0
## 4 Argentina ARG 0
## 5 Armenia ARM 0
## 6 Antarctica ATA 0
## 7 French Southern and Antarctic Lands ATF 0
## 8 Australia AUS 0
## 9 Austria AUT 0
## name name_long brk_a3
## 0 Afghanistan Afghanistan AFG
## 1 Angola Angola AGO
## 2 Albania Albania ALB
## 3 United Arab Emirates United Arab Emirates ARE
## 4 Argentina Argentina ARG
## 5 Armenia Armenia ARM
## 6 Antarctica Antarctica ATA
## 7 Fr. S. Antarctic Lands French Southern and Antarctic Lands ATF
## 8 Australia Australia AUS
## 9 Austria Austria AUT
## brk_name brk_group abbrev postal
## 0 Afghanistan <NA> Afg. AF
## 1 Angola <NA> Ang. AO
## 2 Albania <NA> Alb. AL
## 3 United Arab Emirates <NA> U.A.E. AE
## 4 Argentina <NA> Arg. AR
## 5 Armenia <NA> Arm. ARM
## 6 Antarctica <NA> Ant. AQ
## 7 Fr. S. and Antarctic Lands <NA> Fr. S.A.L. TF
## 8 Australia <NA> Auz. AU
## 9 Austria <NA> Aust. A
## formal_en formal_fr note_adm0
## 0 Islamic State of Afghanistan <NA> <NA>
## 1 People's Republic of Angola <NA> <NA>
## 2 Republic of Albania <NA> <NA>
## 3 United Arab Emirates <NA> <NA>
## 4 Argentine Republic <NA> <NA>
## 5 Republic of Armenia <NA> <NA>
## 6 <NA> <NA> <NA>
## 7 Territory of the French Southern and Antarctic Lands <NA> Fr.
## 8 Commonwealth of Australia <NA> <NA>
## 9 Republic of Austria <NA> <NA>
## note_brk name_sort
## 0 <NA> Afghanistan
## 1 <NA> Angola
## 2 <NA> Albania
## 3 <NA> United Arab Emirates
## 4 <NA> Argentina
## 5 <NA> Armenia
## 6 Multiple claims held in abeyance Antarctica
## 7 <NA> French Southern and Antarctic Lands
## 8 <NA> Australia
## 9 <NA> Austria
## name_alt mapcolor7 mapcolor8 mapcolor9 mapcolor13 pop_est gdp_md_est
## 0 <NA> 5 6 8 7 28400000 22270.0
## 1 <NA> 3 2 6 1 12799293 110300.0
## 2 <NA> 1 4 1 6 3639453 21810.0
## 3 <NA> 2 1 3 3 4798491 184300.0
## 4 <NA> 3 1 3 13 40913584 573900.0
## 5 <NA> 3 1 2 10 2967004 18770.0
## 6 <NA> 4 5 1 NA 3802 760.4
## 7 <NA> 7 5 9 11 140 16.0
## 8 <NA> 1 2 2 7 21262641 800200.0
## 9 <NA> 3 1 3 4 8210281 329500.0
## pop_year lastcensus gdp_year economy
## 0 NA 1979 NA 7. Least developed region
## 1 NA 1970 NA 7. Least developed region
## 2 NA 2001 NA 6. Developing region
## 3 NA 2010 NA 6. Developing region
## 4 NA 2010 NA 5. Emerging region: G20
## 5 NA 2001 NA 6. Developing region
## 6 NA NA NA 6. Developing region
## 7 NA NA NA 6. Developing region
## 8 NA 2006 NA 2. Developed region: nonG7
## 9 NA 2011 NA 2. Developed region: nonG7
## income_grp wikipedia fips_10 iso_a2 iso_a3 iso_n3 un_a3
## 0 5. Low income NA <NA> AF AFG 004 004
## 1 3. Upper middle income NA <NA> AO AGO 024 024
## 2 4. Lower middle income NA <NA> AL ALB 008 008
## 3 2. High income: nonOECD NA <NA> AE ARE 784 784
## 4 3. Upper middle income NA <NA> AR ARG 032 032
## 5 4. Lower middle income NA <NA> AM ARM 051 051
## 6 2. High income: nonOECD NA <NA> AQ ATA 010 <NA>
## 7 2. High income: nonOECD NA <NA> TF ATF 260 <NA>
## 8 1. High income: OECD NA <NA> AU AUS 036 036
## 9 1. High income: OECD NA <NA> AT AUT 040 040
## wb_a2 wb_a3 woe_id adm0_a3_is adm0_a3_us adm0_a3_un adm0_a3_wb
## 0 AF AFG NA AFG AFG NA NA
## 1 AO AGO NA AGO AGO NA NA
## 2 AL ALB NA ALB ALB NA NA
## 3 AE ARE NA ARE ARE NA NA
## 4 AR ARG NA ARG ARG NA NA
## 5 AM ARM NA ARM ARM NA NA
## 6 <NA> <NA> NA ATA ATA NA NA
## 7 <NA> <NA> NA ATF ATF NA NA
## 8 AU AUS NA AUS AUS NA NA
## 9 AT AUT NA AUT AUT NA NA
## continent region_un
## 0 Asia Asia
## 1 Africa Africa
## 2 Europe Europe
## 3 Asia Asia
## 4 South America Americas
## 5 Asia Asia
## 6 Antarctica Antarctica
## 7 Seven seas (open ocean) Seven seas (open ocean)
## 8 Oceania Oceania
## 9 Europe Europe
## subregion region_wb name_len long_len
## 0 Southern Asia South Asia 11 11
## 1 Middle Africa Sub-Saharan Africa 6 6
## 2 Southern Europe Europe & Central Asia 7 7
## 3 Western Asia Middle East & North Africa 20 20
## 4 South America Latin America & Caribbean 9 9
## 5 Western Asia Europe & Central Asia 7 7
## 6 Antarctica Antarctica 10 10
## 7 Seven seas (open ocean) Sub-Saharan Africa 22 35
## 8 Australia and New Zealand East Asia & Pacific 9 9
## 9 Western Europe Europe & Central Asia 7 7
## abbrev_len tiny homepart geometry
## 0 4 NA 1 MULTIPOLYGON (((61.21082 35...
## 1 4 NA 1 MULTIPOLYGON (((16.32653 -5...
## 2 4 NA 1 MULTIPOLYGON (((20.59025 41...
## 3 6 NA 1 MULTIPOLYGON (((51.57952 24...
## 4 4 NA 1 MULTIPOLYGON (((-65.5 -55.2...
## 5 4 NA 1 MULTIPOLYGON (((43.58275 41...
## 6 4 NA 1 MULTIPOLYGON (((-59.57209 -...
## 7 10 2 NA MULTIPOLYGON (((68.935 -48....
## 8 4 NA 1 MULTIPOLYGON (((145.398 -40...
## 9 5 NA 1 MULTIPOLYGON (((16.97967 48...
ggplot(world_map) +
geom_sf(aes(fill = pop_est))+
scale_fill_viridis_c()+
coord_sf()+
theme_void()
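To map gapminder variables rather than the pop_est column that comes with the map, the two tables have to be joined. A sketch, assuming the country_codes table that ships with the gapminder package (columns country and iso_alpha) and the iso_a3 column of world_map shown above; countries whose codes do not match get no value and are drawn in grey. The object names are illustrative, and the rnaturalearth and rnaturalearthdata packages must be installed.

```r
library(dplyr)
library(ggplot2)
library(gapminder)
library(sf)

world_map <- rnaturalearth::ne_countries(returnclass = "sf")

# Life expectancy in 2007, with the three-letter ISO code added
gap_2007 <- gapminder %>%
  filter(year == 2007) %>%
  left_join(country_codes, by = "country") %>%
  select(iso_alpha, lifeExp)

# Keep the sf object on the left so the result is still an sf object
world_life <- left_join(world_map, gap_2007, by = c("iso_a3" = "iso_alpha"))

p <- ggplot(world_life) +
  geom_sf(aes(fill = lifeExp)) +
  scale_fill_viridis_c() +
  theme_void()
print(p)
```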